Prominence-aware automatic speech recognition for conversational speech
Julian Linke, Barbara Schuppler
This paper investigates prominence-aware automatic speech recognition (ASR) by combining prominence detection and speech recognition for conversational Austrian German. First, prominence detectors were developed by fine-tuning wav2vec 2 models to classify word-level prominence. The detector was then used to automatically annotate prosodic prominence in a large corpus. Based on those annotations, we trained novel prominence-aware ASR systems that simultaneously transcribe words and their prominence levels. The integration of prominence information did not change performance compared to our baseline ASR system, while reaching a prominence detection accuracy of 85.53% for utterances where the recognized word sequence was correct. This paper shows that transformer-based models can effectively encode prosodic information and represents a novel contribution to prosody-enhanced ASR, with potential applications for linguistic research and prosody-informed dialogue systems.
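The evaluation described above, where prominence accuracy is scored only on utterances whose recognized word sequence is correct, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `word|level` token format, the `|` separator, the integer prominence levels, and the function names `split_token` and `prominence_accuracy` are all hypothetical.

```python
from typing import List, Tuple

SEP = "|"  # hypothetical separator between a word and its prominence level


def split_token(token: str) -> Tuple[str, int]:
    """Split a hypothetical 'word|level' token into (word, level)."""
    word, _, level = token.rpartition(SEP)
    return word, int(level)


def prominence_accuracy(
    refs: List[List[str]], hyps: List[List[str]]
) -> Tuple[float, int]:
    """Word-level prominence accuracy over utterances with a correct word sequence.

    Returns (accuracy, number of utterances that were scored).
    Utterances whose recognized words differ from the reference are skipped.
    """
    correct = total = used = 0
    for ref, hyp in zip(refs, hyps):
        ref_pairs = [split_token(t) for t in ref]
        hyp_pairs = [split_token(t) for t in hyp]
        # Score prominence only when the word sequences match exactly.
        if [w for w, _ in ref_pairs] != [w for w, _ in hyp_pairs]:
            continue
        used += 1
        for (_, ref_level), (_, hyp_level) in zip(ref_pairs, hyp_pairs):
            total += 1
            correct += int(ref_level == hyp_level)
    accuracy = correct / total if total else 0.0
    return accuracy, used


# Toy data (invented for illustration, not from the corpus).
refs = [["das|0", "ist|0", "gut|2"], ["ja|1", "genau|2"]]
hyps = [["das|0", "ist|1", "gut|2"], ["ja|1", "genauso|2"]]
accuracy, scored = prominence_accuracy(refs, hyps)
```

In this toy example the second utterance contains a word error and is excluded, so accuracy is computed over the three words of the first utterance only. Restricting the score this way keeps the prominence labels comparable word by word, at the cost of ignoring utterances with recognition errors.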